Created: Dilnoza Muslimova
Contact: muslimova@ese.eur.nl
Date: 8-11-2024

This folder contains step-by-step code for cleaning the UK Biobank (UKB) genetic data, running the GWAS, GWAS quality control (QC), polygenic score (PGS) construction,
and the rest of the analysis in "Gene-environment complementarity in educational attainment" by Muslimova et al. (2024).
 
NOTE: 1. Naming rule for the scripts: stepnumber_processtype_copyoriginalnameinserver
      2. All scripts for the genetic data were run on a computing cluster using bash, R, Python, etc., as specified in each script.
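
As an illustration of the naming rule above, the sketch below splits a script name into its three parts. The file name used here is hypothetical and only shows the pattern; it is not taken from the folder. The helper name parse_script_name is likewise an assumption made for this example.

```python
from typing import NamedTuple


class ScriptName(NamedTuple):
    step: str           # step number, kept as text
    process: str        # process type, e.g. "sib" or "gwas"
    original_name: str  # copy of the original script name on the server


def parse_script_name(filename: str) -> ScriptName:
    """Split 'stepnumber_processtype_originalname.ext' into its parts."""
    # Drop the file extension, if there is one.
    stem = filename.rsplit(".", 1)[0] if "." in filename else filename
    # Split on the first two underscores only; the original name may contain more.
    step, process, original_name = stem.split("_", 2)
    return ScriptName(step, process, original_name)


# Hypothetical file name, for illustration only.
print(parse_script_name("3_sib_identify_siblings.R"))
```

The step number is kept as text so that the sketch does not assume a particular numbering format.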


FOLDER 1_cleaning_ukb_genetic_data_clustercomputing

0. "dd" - Download UKB files 

1. "gd" - Genetic data 
Note: here we QC the original genetic data, apply minor allele frequency (MAF) filters, remove duplicates, and remove non-consented individuals
      
2. "rp" - Script for creating the reference panel in the UKB, used for running all GWAS

3. "sib" - Identifying siblings for separating the discovery and holdout samples in the UKB
	The R script uses the UKB kinship file to identify siblings and their relatives and stacks them from the pairwise data
	The R script also identifies parent-child pairs and third-degree relatives
	.do script (Stata) shows how to retrieve the kinship data
	.do script (Stata) "Siblings and their relatives" structures the relationship networks per individual participant
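
The logic of step 3 can be sketched in Python as follows. This is a minimal illustration, not the R or Stata code in this folder. It assumes the pairwise kinship data provide two IDs, a kinship coefficient, and an IBS0 value per pair, and it uses the commonly cited KING cut-offs (with IBS0 < 0.0012 separating parent-child pairs from full siblings). The actual scripts may use different thresholds, and all IDs below are made up.

```python
def classify_pair(kinship: float, ibs0: float) -> str:
    """Classify one related pair (thresholds are assumptions, see lead-in)."""
    if kinship > 0.354:
        return "duplicate_or_twin"
    if kinship >= 0.177:
        # First-degree pair: parents and children share at least one allele
        # at (almost) every site, so their IBS0 is close to zero.
        return "parent_child" if ibs0 < 0.0012 else "sibling"
    if kinship >= 0.0884:
        return "second_degree"
    if kinship >= 0.0442:
        return "third_degree"
    return "unrelated"


def siblings_and_relatives(pairs):
    """Return (siblings, other relatives of those siblings) from pairwise rows."""
    siblings = set()
    neighbours = {}  # id -> set of ids it is related to
    for id1, id2, kinship, ibs0 in pairs:
        relation = classify_pair(kinship, ibs0)
        if relation == "unrelated":
            continue
        neighbours.setdefault(id1, set()).add(id2)
        neighbours.setdefault(id2, set()).add(id1)
        if relation == "sibling":
            siblings.update((id1, id2))
    relatives = set()
    for person in siblings:
        relatives |= neighbours[person]
    return siblings, relatives - siblings


# Toy pairwise data: (id1, id2, kinship, ibs0). All values are invented.
toy_pairs = [
    (1, 2, 0.25, 0.02),    # full siblings
    (2, 3, 0.25, 0.0),     # parent-child
    (4, 5, 0.06, 0.03),    # third-degree relatives
    (6, 7, 0.25, 0.0005),  # parent-child
]
sibs, rels = siblings_and_relatives(toy_pairs)
print(sorted(sibs), sorted(rels))
```

The union of the two returned sets is the group that would be kept out of the discovery sample in this sketch.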


FOLDER 2_ea_gwas_clustercomputing

4. "gwas" - GWAS (EA). For all GWAS, we first create residualized phenotypes in Stata and then feed these phenotypes to fastGWA
4.1. EA UKB GWAS excluding siblings and their relatives
	 Phenotype construction code (.do)
	 GWAS code  
4.2. EA UKB Split sample GWAS excluding siblings and their relatives
	 Phenotype construction code (.do)
	 GWAS code 
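
The residualization step described under step 4 can be sketched as below. The phenotype construction in this folder is done in Stata (.do files); this Python version (using NumPy) only illustrates the idea of regressing the phenotype on covariates and keeping the residuals. The covariates (centred birth year and sex), the IDs, and the values are made up, and writing FID equal to IID is an assumption. The output lines follow the plain-text layout GCTA expects for phenotype files (no header; family ID, individual ID, phenotype).

```python
import numpy as np


def residualize(y, covariates):
    """Return the OLS residuals of y on an intercept plus the covariates."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(covariates, dtype=float)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta


def to_pheno_lines(ids, values):
    """Format rows as 'FID IID phenotype' (assumes FID equals IID)."""
    return [f"{i} {i} {v:.6f}" for i, v in zip(ids, values)]


# Toy data, invented for illustration.
ids = [1001, 1002, 1003, 1004, 1005, 1006]
years_of_education = [10, 13, 12, 16, 15, 20]
covariates = [[-10, 0], [-5, 1], [0, 0], [5, 1], [10, 0], [15, 1]]  # birth year (centred), sex

resid = residualize(years_of_education, covariates)
print("\n".join(to_pheno_lines(ids, resid)))
```

By construction the residuals have mean zero and are uncorrelated with each covariate, so the GWAS on the residualized phenotype is net of those covariates.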

5. "qc" - QC sumstats - one generic script applied to all sumstats
General EasyQC code
QC caller 
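
To give a sense of what QC on summary statistics typically involves, the sketch below applies a few common per-SNP filters in Python. It is not the EasyQC code in this folder and does not reproduce its checks. The column names (a1, a2, eaf, se, p, info) and the thresholds (MAF >= 0.01, INFO >= 0.7) are assumptions chosen for illustration, and the rows are invented.

```python
VALID_ALLELES = {"A", "C", "G", "T"}


def passes_qc(snp, min_maf=0.01, min_info=0.7):
    """Return True if one summary-statistics row passes the illustrative filters."""
    maf = min(snp["eaf"], 1.0 - snp["eaf"])  # minor allele frequency
    return (
        snp["a1"] in VALID_ALLELES
        and snp["a2"] in VALID_ALLELES
        and snp["a1"] != snp["a2"]
        and snp["se"] > 0            # standard error must be positive
        and 0.0 < snp["p"] <= 1.0    # p-value must be in a valid range
        and maf >= min_maf
        and snp["info"] >= min_info  # imputation quality
    )


# Toy rows with hypothetical SNP identifiers.
rows = [
    {"snp": "rs1", "a1": "A", "a2": "G", "eaf": 0.30, "se": 0.01, "p": 0.04, "info": 0.98},
    {"snp": "rs2", "a1": "C", "a2": "T", "eaf": 0.001, "se": 0.05, "p": 0.50, "info": 0.95},
    {"snp": "rs3", "a1": "A", "a2": "C", "eaf": 0.20, "se": 0.0, "p": 0.10, "info": 0.99},
    {"snp": "rs4", "a1": "G", "a2": "T", "eaf": 0.45, "se": 0.02, "p": 0.20, "info": 0.40},
]
kept = [row["snp"] for row in rows if passes_qc(row)]
print(kept)
```

Here only rs1 passes: rs2 is too rare, rs3 has a zero standard error, and rs4 has low imputation quality.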

FOLDER 3_constructing_ea_pgs_clustercomputing
6. "eapgs" - EA PGSs
6.1. EA UKB GWAS excluding siblings and their relatives
6.2. EA UKB Split sample GWAS excluding siblings and their relatives 
7. "imp" - Importing PGSs into Stata

+folder snipar_parental_genotypes 
	code for imputing parental genotypes and computing their PGSs
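
In its simplest form, a PGS is a weighted sum of effect-allele counts, with GWAS effect sizes as weights. The Python sketch below shows only that basic calculation. The scripts in this folder may adjust the weights (for example for linkage disequilibrium) before scoring, which this sketch omits. The SNP identifiers, dosages, and weights are made up.

```python
def polygenic_score(dosages, weights):
    """Sum weight * effect-allele count over SNPs present in both inputs.

    dosages: dict mapping SNP id -> effect-allele count (0 to 2)
    weights: dict mapping SNP id -> GWAS effect size for that allele
    """
    shared = dosages.keys() & weights.keys()
    return sum(weights[snp] * dosages[snp] for snp in shared)


# Toy inputs, invented for illustration.
person = {"rs1": 2, "rs2": 1, "rs3": 0}
gwas_weights = {"rs1": 0.02, "rs2": -0.01, "rs4": 0.05}

# Only rs1 and rs2 appear in both inputs: 2 * 0.02 + 1 * (-0.01).
score = polygenic_score(person, gwas_weights)
print(round(score, 6))
```

SNPs missing from either input are skipped, so the score uses only the overlap between the genotype data and the GWAS results.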

STATA files 
8. Analysis sample and a table comparing the analysis sample to the extended sample
9. Code for figures in the main text
10. Code for tables in the main text
11. Code for tables and figures in the appendix
 